Outlier Detection from a Mixture Distribution When Training Data Are Unlabeled
Abstract
We consider the difficult task of using seismic signals (or any other discriminants) to detect nuclear explosions among the large number of background signals such as earthquakes and mining blasts. Given a ground-truth database (i.e., labeled data), Fisk et al. (1996) consider the problem of detecting outliers (nuclear explosions) from a single background-signal population, and their approach has been applied successfully in several regions around the world. Wang et al. (1997) model the background as a mixture distribution and look for outliers (nuclear events) from that mixture. However, those authors considered only the case in which at least some fraction of the training sample was labeled (that is, at least some ground-truth information was available) and the number of distinct classes of events was known. In the current article, we extend these results to the case in which no events in the training sample are labeled and also to the case in which the number of event types represented in the training sample is unknown. One can view the mixture approach as a robust method for fitting a density to training data that may not be normally distributed, whether or not the data consist of identifiable components that have a physical interpretation. The technique is demonstrated using simulated data as well as two sets of seismic data.
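The abstract does not spell out the estimation procedure or the outlier test, so the following is only a minimal sketch of the general idea under stated assumptions: a one-dimensional discriminant, Gaussian components fit by EM to unlabeled data, the number of components chosen by BIC, and an empirical low-density cutoff for flagging outliers. None of these choices is taken from the article, and every function name and the simulated data are hypothetical.

```python
import math
import random

def component_log_terms(x, weights, means, variances):
    """log(weight_j) + log N(x | mean_j, var_j) for every component j."""
    return [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]

def log_density(x, model):
    """Log of the mixture density at x, computed with the log-sum-exp trick."""
    terms = component_log_terms(x, *model)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fit_mixture(data, k, n_iter=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM; no labels are used."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles, common variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    spread = sum((x - centre) ** 2 for x in data) / n
    variances = [max(var_floor, spread)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            terms = component_log_terms(x, weights, means, variances)
            top = max(terms)
            probs = [math.exp(t - top) for t in terms]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9  # guard against empty components
            weights[j] = nj / (n + k * 1e-9)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var_j)
    return weights, means, variances

def select_mixture(data, max_k=4):
    """Choose the number of components by minimum BIC (an assumed criterion)."""
    best_model, best_bic = None, float("inf")
    for k in range(1, max_k + 1):
        model = fit_mixture(data, k)
        loglik = sum(log_density(x, model) for x in data)
        bic = -2.0 * loglik + (3 * k - 1) * math.log(len(data))
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model

def density_cutoff(data, model, alpha=0.01):
    """Empirical alpha-quantile of the training log-densities."""
    scores = sorted(log_density(x, model) for x in data)
    return scores[int(alpha * len(scores))]

# Simulated unlabeled background: two event types with hypothetical parameters.
random.seed(0)
background = ([random.gauss(0.0, 1.0) for _ in range(150)]
              + [random.gauss(8.0, 1.0) for _ in range(150)])

model = select_mixture(background)
cutoff = density_cutoff(background, model)

def is_outlier(x):
    """Flag x when its mixture log-density falls below the training cutoff."""
    return log_density(x, model) < cutoff

print("components chosen:", len(model[0]))
print("x = 0.0 flagged:", is_outlier(0.0))
print("x = 30.0 flagged:", is_outlier(30.0))
```

The density cutoff is a simple stand-in for a formal hypothesis test: it fixes the false-alarm rate on the training sample at roughly `alpha`, which is a design choice made here for brevity and not a description of the article's test.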